Why Multi-Core Processors?


Processor  development  trend 

•  Increasing  overall  performance  by  integrating  multiple  cores 

Embedded  systems:  Actively  adopting  multi-core  CPUs 

•  Automotive: 

-  Freescale  i.MX6  4-core  CPU 

-  NVIDIA  Tegra  K1  platform 

•  Avionics  and  defense: 

-  Rugged  Intel  i7  single  board  computers 

-  Freescale  P4080  8-core  CPU 
Shared Hardware: Multicore Memory System

Cache Interference Across Cores

Bank Interference Across Cores


Impact  of  Memory  Interference 


1  attacker  ->  Max  5.5x  increase 

2  attackers  ->  Max  8.4x  increase 

3  attackers  ->  Max  12x  increase 


We should predict, bound, and reduce the memory interference delay!
Cache / Bank Partitioning (Coloring)

[Figure: virtual pages, cache, and main-memory banks]

Software  Engineering  Institute 


Carnegie  Mellon 


Cache  and  Bank  Address  Bits 


Cache  Index 


E.g.:

• 2 bank bits

• 2 cache bits

• 1 shared bit
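A minimal sketch of how cache and bank colors could be derived from a physical address. The bit positions are hypothetical and chosen only to illustrate one reading of the example above (2 cache bits and 2 bank bits that overlap in 1 shared bit); real positions depend on the platform's cache geometry and DRAM address mapping.

```python
from __future__ import annotations

# Hypothetical bit layout (assumption, not taken from any specific platform):
# cache-color bits are physical address bits 12-13, bank bits are 13-14,
# so bit 13 is shared between the cache index and the bank index.
CACHE_BITS = (12, 13)
BANK_BITS = (13, 14)


def extract_bits(addr: int, positions: tuple[int, ...]) -> int:
    """Pack the selected address bits into a small integer (lowest bit first)."""
    value = 0
    for i, pos in enumerate(positions):
        value |= ((addr >> pos) & 1) << i
    return value


def cache_color(addr: int) -> int:
    return extract_bits(addr, CACHE_BITS)


def bank_color(addr: int) -> int:
    return extract_bits(addr, BANK_BITS)


def feasible_color_pairs() -> set[tuple[int, int]]:
    """Enumerate the (cache color, bank color) pairs that some page can have."""
    pairs = set()
    for page in range(1 << 3):  # address bits 12..14 cover every combination
        addr = page << 12
        pairs.add((cache_color(addr), bank_color(addr)))
    return pairs
```

Under this layout only 8 of the 16 (cache color, bank color) pairs are reachable, because the shared bit fixes part of one color once the other is chosen. This is the kind of constraint that makes independent cache and bank color assignments conflict.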
Coordinated Cache and Bank Partitioning (Private Partitions)

Avoid  conflicting  color  assignments 

Take  advantage  of  different  conflict  behaviors 

•  Banks  can  be  shared  within  same  core  but  not  across  cores 

•  Cache  cannot  be  shared  within  or  across  cores 

Take  advantage  of  sensitivity  of  execution  time  to  cache 

•  Task  with  highest  sensitivity  to  cache  is  assigned  more  cache 

•  Diminishing  returns  taken  into  account 

Two  algorithms  explored 

•  Mixed-Integer  Linear  Programming 

•  Knapsack 
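The sensitivity idea above can be sketched as a small multiple-choice knapsack: each task receives at least one cache partition, and the split that minimizes the total execution time is chosen. This is an illustrative sketch only, not the algorithm from the slides; the function name, the profile format, and the numbers are assumptions, and bank and core assignment are ignored.

```python
def allocate_cache_partitions(profiles, total):
    """Split `total` cache partitions among tasks.

    profiles[i][k - 1] is the (hypothetical, measured) execution time of
    task i when it owns k cache partitions. Returns a tuple
    (total execution time, partitions per task).
    """
    # best[c] = (cost, allocation) using exactly c partitions so far
    best = {0: (0.0, [])}
    for profile in profiles:
        nxt = {}
        for used, (cost, alloc) in best.items():
            for k, time in enumerate(profile, start=1):
                c = used + k
                if c > total:
                    break  # larger k only uses more partitions
                cand = (cost + time, alloc + [k])
                if c not in nxt or cand[0] < nxt[c][0]:
                    nxt[c] = cand
        best = nxt
    if not best:
        raise ValueError("not enough cache partitions for one per task")
    return min(best.values(), key=lambda entry: entry[0])


# Task 0 is cache-sensitive with diminishing returns; task 1 barely benefits.
profiles = [[10, 6, 5, 4.8], [8, 7.5, 7.4, 7.4]]
cost, allocation = allocate_cache_partitions(profiles, total=4)
```

With these made-up profiles the sensitive task ends up with three of the four partitions, since the third partition still saves more time for it than a second partition would save for the other task.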
Experimental Results

SPEC.sphinx3


Shared  Bank  Partitioning 


Explicitly  considers  the  timing  characteristics  of  major  DRAM 
resources 

•  Rank/bank/bus  timing  constraints  (JEDEC  standard) 

•  Request  re-ordering  effect 


Bounding  memory  interference  delay  for  a  task 

•  Combines  request-driven  and  job-driven  approaches 


Software  DRAM  bank  partitioning  awareness 

•  Analyzes  the  effect  of  dedicated  and  shared  DRAM  banks 
DRAM Organization

[Figure: a DRAM rank made up of DRAM chips, connected through the command bus, address bus, and 64-bit data bus]


DRAM  access  latency  varies  depending  on  which  row  is  stored  in  the  row  buffer 
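The row-buffer dependence can be sketched with three cases: a row hit, an access to a closed (precharged) bank, and a row conflict. The timing parameter names follow common JEDEC usage (tRP, tRCD, tCL), but the cycle counts are placeholder assumptions and the model ignores every other timing constraint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Timing:
    # Placeholder cycle counts (assumptions), not values for a specific device.
    tRP: int = 11   # precharge: close the currently open row
    tRCD: int = 11  # activate: open a row before a column access
    tCL: int = 11   # column access (CAS) latency


def access_latency(open_row: Optional[int], row: int, t: Timing) -> int:
    """Simplified latency of one read, given the row held in the row buffer."""
    if open_row == row:
        return t.tCL                   # row hit: column access only
    if open_row is None:
        return t.tRCD + t.tCL          # bank closed: activate, then access
    return t.tRP + t.tRCD + t.tCL      # row conflict: precharge first
```

In this simplified model a row conflict costs three times as much as a row hit, which is why a scheduler that favors row hits raises throughput and why another core evicting the open row hurts.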
Memory Controller


Request  buffer 


Memory  scheduler 


Two-level  hierarchical 
scheduling  structure 


DRAM address/command buses, data bus
Memory Scheduling Policy


•  FR-FCFS:  First-Ready,  First-Come  First-Serve 

-  Goal:  maximize  DRAM  throughput  ->  Maximize  row  buffer  hit  rate 


[Figure: per-bank schedulers (Bank 1, Bank 2) feeding the channel scheduler]


1.  Bank  scheduler 

•  Considers  bank  timing  constraints 

•  Prioritizes  row-hit  requests 

•  In  case  of  tie,  prioritizes  older  requests 

2.  Channel  scheduler 

•  Considers  channel  timing  constraints 

•  Prioritizes  older  requests 


Memory  access  interference  occurs  at  both  bank  and  channel  schedulers 

•  Intra-bank  interference  at  bank  scheduler 

•  Inter-bank  interference  at  channel  scheduler 
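The two-level policy above can be sketched as follows. This is a simplified model that leaves out all bank and channel timing constraints; the `Request` fields and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    arrival: int  # arrival order; smaller means older
    core: int
    bank: int
    row: int


def bank_pick(queue, open_row):
    """Bank scheduler: row hits first, ties broken by age."""
    if not queue:
        return None
    return min(queue, key=lambda r: (r.row != open_row, r.arrival))


def channel_pick(candidates):
    """Channel scheduler: oldest request among the per-bank winners."""
    ready = [r for r in candidates if r is not None]
    return min(ready, key=lambda r: r.arrival) if ready else None


def schedule_next(queues, open_rows):
    """queues: bank -> list of requests; open_rows: bank -> open row."""
    winner = channel_pick(
        bank_pick(q, open_rows.get(b)) for b, q in queues.items()
    )
    if winner is not None:
        queues[winner.bank].remove(winner)
        open_rows[winner.bank] = winner.row  # serviced row stays open
    return winner


queues = {
    0: [Request(0, core=0, bank=0, row=5), Request(1, core=1, bank=0, row=7)],
    1: [Request(2, core=2, bank=1, row=3)],
}
open_rows = {0: 7}
order = [schedule_next(queues, open_rows).arrival for _ in range(3)]
```

In this run the younger row-hit request from core 1 is served before the older request from core 0 in the same bank, which is an instance of intra-bank interference caused by reordering.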
DRAM Bank Partitioning


•  Prevents  intra-bank  interference  by  dedicating  different 
DRAM  banks  to  each  core 
-  Can  be  supported  in  the  OS  kernel 


(1) w/o bank partitioning: intra-bank and inter-bank interference


(2) w/ bank partitioning: only inter-bank interference
Bounding Memory Interference Delay


Schedulability  Analysis 
Response-Time Test


• The memory interference delay cannot exceed the bound from either the request-driven (RD) or the job-driven (JD) approach

- We take the smaller of the two results


Extended  response-time  test 


Classical  iterative  response-time  test 
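A sketch of how the classical iterative test can be extended with an interference term. The classical recurrence is R = C_i + sum over higher-priority tasks j of ceil(R / T_j) * C_j; the extension here simply adds the smaller of two interference bounds at each iteration. The slides do not give the exact form of the RD and JD bounds, so they are passed in as hypothetical callables.

```python
import math


def response_time(i, C, T, D, interference):
    """Iterative response-time test for task i.

    Tasks are indexed by decreasing priority (0 = highest). C, T, D are the
    execution times, periods, and deadlines. interference(i, R) returns an
    upper bound on the memory interference delay within a window of length R.
    Returns the converged response time, or None if the deadline is exceeded.
    """
    R = C[i]
    while True:
        preemption = sum(math.ceil(R / T[j]) * C[j] for j in range(i))
        nxt = C[i] + preemption + interference(i, R)
        if nxt > D[i]:
            return None
        if nxt == R:
            return R
        R = nxt


def combined_bound(rd_bound, jd_bound):
    """The delay cannot exceed either bound, so take the smaller one."""
    return lambda i, R: min(rd_bound(i, R), jd_bound(i, R))


# Made-up bounds for illustration only.
bound = combined_bound(lambda i, R: 2, lambda i, R: 1 + R // 4)
C, T, D = [1, 2], [4, 8], [4, 8]
with_interference = response_time(1, C, T, D, bound)
classical = response_time(1, C, T, D, lambda i, R: 0)
```

Passing a zero interference function recovers the classical test, so the two results can be compared directly for the same task set.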
Experiment: Severe Memory Interference


• Private DRAM Bank


4.1x increase


DRAM bank partitioning helps reduce the memory interference
Non-Severe Memory Interference


•  Private  DRAM  Bank 
Implementation


Cache  and  Bank  Partitioning  implemented  in  Linux/RK 
•  Associates  Resource  Reservations  to  Linux  Threads 

-  Memory  reservation 

-  Cache  reservation 

-  CPU  reservation 


“Portable”  Kernel  Module 

•  Hooks  into  on-demand  page  allocation 

• Creates a large memory reserve at boot time

- Pages are classified by cache color and bank color
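The reserve-and-classify idea can be sketched in user-space Python as a pool of page frames grouped by color. This is only an illustration of the data structure, not Linux/RK code; the class name, the `color_of` mapping, and the frame numbers are assumptions.

```python
from collections import defaultdict


class ColoredPagePool:
    """Page frames reserved up front and grouped by (cache, bank) color."""

    def __init__(self, frames, color_of):
        # color_of(frame) -> (cache color, bank color); platform specific
        self.free = defaultdict(list)
        for frame in frames:
            self.free[color_of(frame)].append(frame)

    def alloc(self, allowed_colors):
        """Return a free frame whose color belongs to the task's reservation."""
        for color in allowed_colors:
            if self.free[color]:
                return self.free[color].pop()
        raise MemoryError("reservation has no free page of an allowed color")


# Hypothetical mapping: frame bits 0-1 give the cache color, bits 1-2 the bank color.
pool = ColoredPagePool(range(16), lambda pfn: (pfn & 0b11, (pfn >> 1) & 0b11))
frame = pool.alloc([(2, 1)])
```

A page-fault hook would call something like `alloc` with the colors reserved for the faulting thread, so that every page the thread touches lands in its own cache and bank partition.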




Model  Problem 


Ball-following  Controller 
on  Odroid+Linux/RK 


[Figure: ball-following setup, showing the memory interference and the fixed distance to the ball]


Ball-Following: keep a fixed distance as the ball moves around
Experimental Results (1)

Frame Processing (no protection, no attackers)

[Plot; x-axis: Time [s]]
Experimental Results (2)

Frame Processing (no protection, 3 attackers)

Deadline misses at 0.2 s: only 195 out of 279 frames processed (30% loss)
Experimental Results (3)


Frame  Processing  (with  protection,  3  attackers) 
[Plot legend: frame inter-arrival time, frame processing time]


No deadline misses. Processing time stays below the 0.2 s inter-arrival time (period).
Concluding Remarks


Multicore processors challenge previous results in real-time systems

• Interference from shared hardware

- Cache, memory banks, memory bus

Leads to less usable processing capacity

• 12x (1200%) execution time on a four-core machine leaves about 1/12 of the usable capacity (92% reduction from single core)

Our approach

•  Coordinated  private  partitions  for  cache  and  memory 

•  Shared  bank  partitions 

•  Implemented  in  Linux/RK 


Experimental  results  for  model  avionics  application 
•  Protects  control  algorithm  from  interference 